Results 1 - 20 of 40
1.
Genome Res ; 32(10): 1952-1964, 2022 10.
Article in English | MEDLINE | ID: mdl-36109148

ABSTRACT

We assembled the 9.8-Gbp genome of western redcedar (WRC; Thuja plicata), an ecologically and economically important conifer species of the Cupressaceae. The genome assembly, derived from a uniquely inbred tree produced through five generations of self-fertilization (selfing), was determined to be 86% complete by BUSCO analysis, one of the most complete genome assemblies for a conifer. Population genomic analysis revealed WRC to be one of the most genetically depauperate wild plant species, with an effective population size of approximately 300 and no significant genetic differentiation across its geographic range. Nucleotide diversity, π, is low for a continuous tree species, with many loci showing zero diversity, and the ratio of π at zero- to fourfold degenerate sites is relatively high (approximately 0.33), suggestive of weak purifying selection. Using an array of genetic lines derived from up to five generations of selfing, we explored the relationship between genetic diversity and mating system. Although overall heterozygosity was found to decline faster than expected during selfing, heterozygosity persisted at many loci, and nearly 100 loci were found to deviate from expectations of genetic drift, suggestive of associative overdominance. Nonreference alleles at such loci often harbor deleterious mutations and are rare in natural populations, implying that balanced polymorphisms are maintained by linkage to dominant beneficial alleles. This may account for how WRC remains responsive to natural and artificial selection, despite low genetic diversity.


Subjects
Tracheophyta , Tracheophyta/genetics , Self-Fertilization/genetics , Alleles , Heterozygote , Genetic Polymorphism , Genetic Variation , Genetic Selection
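
The abstract above compares observed heterozygosity against what is expected during selfing. As a minimal sketch of that neutral expectation, assuming the textbook result that each generation of self-fertilization halves heterozygosity (the function name and starting value are illustrative, not taken from the paper):

```python
def expected_heterozygosity(h0: float, generations: int) -> float:
    """Neutral expectation: heterozygosity halves with every generation of selfing."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return h0 * 0.5 ** generations


# Fraction of the starting heterozygosity expected to remain after 0..5
# generations of selfing (h0 = 1.0 is an arbitrary starting value).
for generation in range(6):
    print(generation, expected_heterozygosity(1.0, generation))
```

Loci that stay heterozygous well above this curve are the kind of deviation the abstract attributes to associative overdominance.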
3.
Nat Biotechnol ; 38(11): 1347-1355, 2020 11.
Article in English | MEDLINE | ID: mdl-32541955

ABSTRACT

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.


Subjects
Germ-Line Mutation/genetics , INDEL Mutation/genetics , Diploidy , Genomic Structural Variation , Humans , Molecular Sequence Annotation , DNA Sequence Analysis
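
A benchmark that identifies false negatives and false positives is typically summarized as recall and precision. The sketch below assumes a callset has already been compared against the benchmark; the true-positive, false-positive and false-negative counts in the usage line are hypothetical, while the insertion and deletion totals are the ones quoted in the abstract.

```python
from typing import Tuple

# Counts quoted in the abstract for the final benchmark set.
BENCHMARK_INSERTIONS = 7281
BENCHMARK_DELETIONS = 5464


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive and false-negative counts."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("need at least one call and one benchmark variant")
    return tp / (tp + fp), tp / (tp + fn)


# Hypothetical comparison result, for illustration only.
precision, recall = precision_recall(tp=90, fp=10, fn=30)
print(precision, recall)
```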
4.
Genome Biol Evol ; 12(7): 1174-1179, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32449750

ABSTRACT

Plant mitochondrial genomes vary widely in size. Although many plant mitochondrial genomes have been sequenced and assembled, the vast majority are of angiosperms, and few are of gymnosperms. Most plant mitochondrial genomes are smaller than a megabase, with a few notable exceptions. We have sequenced and assembled the complete 5.5-Mb mitochondrial genome of Sitka spruce (Picea sitchensis), to date, one of the largest mitochondrial genomes of a gymnosperm. We sequenced the whole genome using Oxford Nanopore MinION, and then identified contigs of mitochondrial origin assembled from these long reads based on sequence homology to the white spruce mitochondrial genome. The assembly graph shows a multipartite genome structure, composed of one smaller 168-kb circular segment of DNA, and a larger 5.4-Mb single component with a branching structure. The assembly graph gives insight into a putative complex physical genome structure, and its branching points may represent active sites of recombination.


Subjects
Mitochondrial Genome , Plant Genome , Picea/genetics , Molecular Structure
5.
Microbiol Resour Announc ; 8(24)2019 Jun 13.
Article in English | MEDLINE | ID: mdl-31196920

ABSTRACT

Engelmann spruce (Picea engelmannii) is a conifer found primarily on the west coast of North America. Here, we present the complete chloroplast genome sequence of Picea engelmannii genotype Se404-851. This chloroplast sequence will benefit future conifer genomic research and contribute resources to further species conservation efforts.

6.
Genes (Basel) ; 10(7)2019 06 26.
Article in English | MEDLINE | ID: mdl-31248052

ABSTRACT

The Steller sea lion is the largest member of the Otariidae family and is found in the coastal waters of the northern Pacific Rim. Here, we present the Steller sea lion genome, determined through DNA sequencing approaches that utilized microfluidic partitioning library construction, as well as nanopore technologies. These methods constructed a highly contiguous assembly with a scaffold N50 length of over 14 megabases, a contig N50 length of over 242 kilobases and a total length of 2.404 gigabases. As a measure of completeness, 95.1% of 4104 highly conserved mammalian genes were found to be complete within the assembly. Further annotation identified 19,668 protein coding genes. The assembled genome sequence and underlying sequence data can be found at the National Center for Biotechnology Information (NCBI) under the BioProject accession number PRJNA475770.


Subjects
Genome , Sea Lions/genetics , Animals , Genomic Library , Microfluidics/methods , Nanopores , Whole Genome Sequencing
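
Scaffold and contig N50, reported in the abstract above, follow the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch, with made-up lengths in the usage line:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of the total bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must be non-empty")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for safety


# Illustrative scaffold lengths, not data from the paper.
print(n50([2, 3, 4, 5, 6]))
```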
7.
Microbiol Resour Announc ; 8(23)2019 Jun 06.
Article in English | MEDLINE | ID: mdl-31171622

ABSTRACT

Here, we present the complete chloroplast genome sequence of white spruce (Picea glauca, genotype WS77111), a coniferous tree widespread in the boreal forests of North America. This sequence contributes to genomic and phylogenetic analyses of the Picea genus that are part of ongoing research to understand their adaptation to environmental stress.

8.
Bioinformatics ; 35(21): 4448-4450, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31004474

ABSTRACT

SUMMARY: The ORCA bioinformatics environment is a Docker image that contains hundreds of bioinformatics tools and their dependencies. The ORCA image and accompanying server infrastructure provide a comprehensive bioinformatics environment for education and research. The ORCA environment on a server is implemented using Docker containers, but without requiring users to interact directly with Docker, suitable for novices who may not yet have familiarity with managing containers. ORCA has been used successfully to provide a private bioinformatics environment to external collaborators at a large genome institute, for teaching an undergraduate class on bioinformatics targeted at biologists, and to provide a ready-to-go bioinformatics suite for a hackathon. Using ORCA eliminates time that would be spent debugging software installation issues, so that time may be better spent on education and research. AVAILABILITY AND IMPLEMENTATION: The ORCA Docker image is available at https://hub.docker.com/r/bcgsc/orca/. The source code of ORCA is available at https://github.com/bcgsc/orca under the MIT license.


Subjects
Computational Biology , Software , Genome
9.
Genes (Basel) ; 9(12)2018 Nov 30.
Article in English | MEDLINE | ID: mdl-30513700

ABSTRACT

The grizzly bear (Ursus arctos ssp. horribilis) represents the largest population of brown bears in North America. Its genome was sequenced using a microfluidic partitioning library construction technique, and these data were supplemented with sequencing from a nanopore-based long read platform. The final assembly was 2.33 Gb with a scaffold N50 of 36.7 Mb, and the genome is of comparable size to that of its close relative the polar bear (2.30 Gb). An analysis using 4104 highly conserved mammalian genes indicated that 96.1% were found to be complete within the assembly. An automated annotation of the genome identified 19,848 protein coding genes. Our study shows that the combination of the two sequencing modalities that we used is sufficient for the construction of highly contiguous reference quality mammalian genomes. The assembled genome sequence and the supporting raw sequence reads are available from the NCBI (National Center for Biotechnology Information) under the bioproject identifier PRJNA493656, and the assembly described in this paper is version QXTK01000000.

10.
BMC Bioinformatics ; 19(1): 393, 2018 Oct 26.
Article in English | MEDLINE | ID: mdl-30367597

ABSTRACT

BACKGROUND: Genome sequencing yields the sequence of many short snippets of DNA (reads) from a genome. Genome assembly attempts to reconstruct the original genome from which these reads were derived. This task is difficult due to gaps and errors in the sequencing data, repetitive sequence in the underlying genome, and heterozygosity. As a result, assembly errors are common. In the absence of a reference genome, these misassemblies may be identified by comparing the sequencing data to the assembly and looking for discrepancies between the two. Once identified, these misassemblies may be corrected, improving the quality of the assembled sequence. Although tools exist to identify and correct misassemblies using Illumina paired-end and mate-pair sequencing, no such tool yet exists that makes use of the long distance information of the large molecules provided by linked reads, such as those offered by the 10x Genomics Chromium platform. We have developed the tool Tigmint to address this gap. RESULTS: To demonstrate the effectiveness of Tigmint, we applied it to assemblies of a human genome using short reads assembled with ABySS 2.0 and other assemblers. Tigmint reduced the number of misassemblies identified by QUAST in the ABySS assembly by 216 (27%). While scaffolding with ARCS alone more than doubled the scaffold NGA50 of the assembly from 3 to 8 Mbp, the combination of Tigmint and ARCS improved the scaffold NGA50 of the assembly over five-fold to 16.4 Mbp. This notable improvement in contiguity highlights the utility of assembly correction in refining assemblies. We demonstrate the utility of Tigmint in correcting the assemblies of multiple tools, as well as in using Chromium reads to correct and scaffold assemblies of long single-molecule sequencing. CONCLUSIONS: Scaffolding an assembly that has been corrected with Tigmint yields a final assembly that is both more correct and substantially more contiguous than an assembly that has not been corrected. 
Using single-molecule sequencing in combination with linked reads enables a genome sequence assembly that achieves both a high sequence contiguity as well as high scaffold contiguity, a feat not currently achievable with either technology alone.


Subjects
High-Throughput Nucleotide Sequencing/methods , Software , Human Chromosomes/genetics , Human Genome , Genomics , Humans , Nanopores , Repetitive Nucleic Acid Sequences
11.
BMC Bioinformatics ; 19(1): 234, 2018 06 20.
Article in English | MEDLINE | ID: mdl-29925315

ABSTRACT

BACKGROUND: The long-range sequencing information captured by linked reads, such as those available from 10× Genomics (10xG), helps resolve genome sequence repeats, and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read genome scaffolding methodology that uses linked reads to organize genome assemblies further into contiguous drafts. Our approach departs from other read alignment-dependent linked read scaffolders, including our own (ARCS), and uses a kmer-based mapping approach. The kmer mapping strategy has several advantages over read alignment methods, including better usability and faster processing, as it precludes the need for input sequence formatting and draft sequence assembly indexing. The reliance on kmers instead of read alignments for pairing sequences relaxes the workflow requirements, and drastically reduces the run time. RESULTS: Here, we show how linked reads, when used in conjunction with Hi-C data for scaffolding, improve a draft human genome assembly of PacBio long-read data five-fold (baseline vs. ARKS NG50 = 4.6 vs. 23.1 Mbp, respectively). We also demonstrate how the method provides further improvements of a megabase-scale Supernova human genome assembly (NG50 = 14.74 Mbp vs. 25.94 Mbp before and after ARKS), which itself exclusively uses linked read data for assembly, with an execution speed six to nine times faster than competitive linked read scaffolders (~ 10.5 h compared to 75.7 h, on average). Following ARKS scaffolding of a human genome 10xG Supernova assembly (of cell line NA12878), fewer than 9 scaffolds cover each chromosome, except the largest (chromosome 1, n = 13). CONCLUSIONS: ARKS uses a kmer mapping strategy instead of linked read alignments to record and associate the barcode information needed to order and orient draft assembly sequences. 
The simplified workflow, when compared to that of our initial implementation, ARCS, markedly improves run time performances on experimental human genome datasets. Furthermore, the novel distance estimator in ARKS utilizes barcoding information from linked reads to estimate gap sizes. It accomplishes this by modeling the relationship between known distances of a region within contigs and calculating associated Jaccard indices. ARKS has the potential to provide correct, chromosome-scale genome assemblies, promptly. We expect ARKS to have broad utility in helping refine draft genomes.


Subjects
Human Chromosomes/genetics , Human Genome , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Software , Humans
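
The ARKS abstract mentions Jaccard indices computed from linked-read barcodes. The sketch below shows only the index itself, applied to the sets of barcodes observed in two regions; the barcode names are invented, and how ARKS turns the index into a gap-size estimate is not reproduced here.

```python
from typing import Set


def jaccard_index(barcodes_a: Set[str], barcodes_b: Set[str]) -> float:
    """Shared barcodes divided by all barcodes seen in either region."""
    union = barcodes_a | barcodes_b
    if not union:
        return 0.0
    return len(barcodes_a & barcodes_b) / len(union)


# Hypothetical barcode sets for two regions of a draft assembly.
region_1 = {"BX001", "BX002", "BX003"}
region_2 = {"BX002", "BX003", "BX004"}
print(jaccard_index(region_1, region_2))
```

Regions that lie close together on the same molecule tend to share more barcodes, so a higher index suggests a shorter distance between them.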
12.
Genome Res ; 27(5): 768-777, 2017 05.
Article in English | MEDLINE | ID: mdl-28232478

ABSTRACT

The assembly of DNA sequences de novo is fundamental to genomics research. It is the first of many steps toward elucidating and characterizing whole genomes. Downstream applications, including analysis of genomic variation between species, between or within individuals critically depend on robustly assembled sequences. In the span of a single decade, the sequence throughput of leading DNA sequencing instruments has increased drastically, and coupled with established and planned large-scale, personalized medicine initiatives to sequence genomes in the thousands and even millions, the development of efficient, scalable and accurate bioinformatics tools for producing high-quality reference draft genomes is timely. With ABySS 1.0, we originally showed that assembling the human genome using short 50-bp sequencing reads was possible by aggregating the half terabyte of compute memory needed over several computers using a standardized message-passing system (MPI). We present here its redesign, which departs from MPI and instead implements algorithms that employ a Bloom filter, a probabilistic data structure, to represent a de Bruijn graph and reduce memory requirements. We benchmarked ABySS 2.0 human genome assembly using a Genome in a Bottle data set of 250-bp Illumina paired-end and 6-kbp mate-pair libraries from a single individual. Our assembly yielded a NG50 (NGA50) scaffold contiguity of 3.5 (3.0) Mbp using <35 GB of RAM. This is a modest memory requirement by today's standards and is often available on a single computer. We also investigate the use of BioNano Genomics and 10x Genomics' Chromium data to further improve the scaffold NG50 (NGA50) of this assembly to 42 (15) Mbp.


Subjects
Contig Mapping/methods , Genomics/methods , Software , Contig Mapping/standards , Genome Size , Genomics/standards , Humans , DNA Sequence Analysis/methods , DNA Sequence Analysis/standards
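
To illustrate how a Bloom filter can stand in for a de Bruijn graph, here is a minimal sketch: k-mers are inserted into a bit array, and the graph's edges are recovered by querying the four possible one-base extensions of a k-mer. The class, the BLAKE2b-based hashing and the default sizes are assumptions made for this example and are not ABySS 2.0's implementation.

```python
import hashlib
from typing import Iterable, Iterator, List


class BloomFilter:
    """Bit array queried through several hash functions; false positives are possible."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several hash values by salting BLAKE2b with the hash index.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def build_filter(reads: Iterable[str], k: int,
                 num_bits: int = 1 << 16, num_hashes: int = 3) -> BloomFilter:
    """Insert every k-mer of every read into a fresh Bloom filter."""
    bloom = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])
    return bloom


def successors(bloom: BloomFilter, kmer: str) -> List[str]:
    """Implicit graph edges: one-base extensions of kmer that the filter reports as present."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]


bloom = build_filter(["ACGTAC"], k=3)
print(successors(bloom, "ACG"))
```

The filter never stores the k-mers themselves, which is where the memory saving comes from; the price is an occasional false-positive edge that an assembler has to tolerate or prune.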
13.
PLoS One ; 11(9): e0163059, 2016.
Article in English | MEDLINE | ID: mdl-27632164

ABSTRACT

The linked read sequencing library preparation platform by 10X Genomics produces barcoded sequencing libraries, which are subsequently sequenced using the Illumina short read sequencing technology. In this new approach, long fragments of DNA are partitioned into separate micro-reactions, where the same index sequence is incorporated into each of the sequencing fragment inserts derived from a given long fragment. In this study, we exploited this property by using reads from index sequences associated with a large number of reads, to assemble the chloroplast genome of the Sitka spruce tree (Picea sitchensis). Here we report on the first Sitka spruce chloroplast genome assembled exclusively from P. sitchensis genomic libraries prepared using the 10X Genomics protocol. We show that the resulting 124,049 base pair long genome shares high sequence similarity with the related white spruce and Norway spruce chloroplast genomes, but diverges substantially from a previously published P. sitchensis- P. thunbergii chimeric genome. The use of reads from high-frequency indices enabled separation of the nuclear genome reads from that of the chloroplast, which resulted in the simplification of the de Bruijn graphs used at the various stages of assembly.


Subjects
Chloroplasts/genetics , Plant Genome , Picea/genetics , Phylogeny , Picea/classification
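
The read-selection idea in this abstract, keeping reads whose index sequence (barcode) is associated with many reads, can be sketched as a simple count-and-filter step. The threshold, barcode names and read representation below are hypothetical; the paper's actual cutoff is not given in the abstract.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Read = Tuple[str, str]  # (barcode, sequence)


def reads_from_frequent_barcodes(reads: Iterable[Read], min_reads: int) -> List[Read]:
    """Keep only reads whose barcode occurs at least min_reads times."""
    read_list = list(reads)
    counts = Counter(barcode for barcode, _ in read_list)
    return [(barcode, seq) for barcode, seq in read_list if counts[barcode] >= min_reads]


# Toy input: BX1 is a high-frequency barcode, BX2 is not.
toy_reads = [("BX1", "ACGT"), ("BX1", "CGTA"), ("BX1", "GTAC"), ("BX2", "TTTT")]
print(reads_from_frequent_barcodes(toy_reads, min_reads=3))
```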
14.
Genome Biol Evol ; 8(1): 29-41, 2015 Dec 08.
Article in English | MEDLINE | ID: mdl-26645680

ABSTRACT

The genome sequences of the plastid and mitochondrion of white spruce (Picea glauca) were assembled from whole-genome shotgun sequencing data using ABySS. The sequencing data contained reads from both the nuclear and organellar genomes, and reads of the organellar genomes were abundant in the data as each cell harbors hundreds of mitochondria and plastids. Hence, assembly of the 123-kb plastid and 5.9-Mb mitochondrial genomes was accomplished by analyzing data sets primarily representing low coverage of the nuclear genome. The assembled organellar genomes were annotated for their coding genes, ribosomal RNA, and transfer RNA. Transcript abundances of the mitochondrial genes were quantified in three developmental tissues and five mature tissues using data from RNA-seq experiments. C-to-U RNA editing was observed in the majority of mitochondrial genes, and in four genes, editing events were noted to modify ACG codons to create cryptic AUG start codons. The informatics methodology presented in this study should prove useful to assemble organellar genomes of other plant species using whole-genome shotgun sequencing data.


Subjects
Chloroplast Genome , Mitochondrial Genome , Plant Genome , Picea/genetics , Base Sequence , Contig Mapping , Molecular Sequence Annotation , Molecular Sequence Data
15.
Int J Genomics ; 2015: 196591, 2015.
Article in English | MEDLINE | ID: mdl-26539459

ABSTRACT

De novo assembly of the genome of a species is essential in the absence of a reference genome sequence. Many scalable assembly algorithms use the de Bruijn graph (DBG) paradigm to reconstruct genomes, where a table of subsequences of a certain length is derived from the reads, and their overlaps are analyzed to assemble sequences. Despite longer subsequences unlocking longer genomic features for assembly, associated increase in compute resources limits the practicability of DBG over other assembly archetypes already designed for longer reads. Here, we revisit the DBG paradigm to adapt it to the changing sequencing technology landscape and introduce three data structure designs for spaced seeds in the form of paired subsequences. These data structures address memory and run time constraints imposed by longer reads. We observe that when a fixed distance separates seed pairs, it provides increased sequence specificity with increased gap length. Further, we note that Bloom filters would be suitable to implicitly store spaced seeds and be tolerant to sequencing errors. Building on this concept, we describe a data structure for tracking the frequencies of observed spaced seeds. These data structure designs will have applications in genome, transcriptome and metagenome assemblies, and read error correction.
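
A spaced seed in the form of paired subsequences can be pictured as two k-mers separated by a fixed number of unread positions. The sketch below only enumerates such pairs from a sequence; the function name and parameters are illustrative, and the three data structures proposed in the paper are not reproduced.

```python
from typing import Iterator, Tuple


def paired_kmers(seq: str, k: int, gap: int) -> Iterator[Tuple[str, str]]:
    """Yield (left k-mer, right k-mer) pairs separated by exactly `gap` bases."""
    span = 2 * k + gap  # total window covered by one seed pair
    for i in range(len(seq) - span + 1):
        yield seq[i:i + k], seq[i + k + gap:i + span]


# Two 2-mers separated by 3 bases, sliding along a toy sequence.
print(list(paired_kmers("ACGTACGT", k=2, gap=3)))
```

Because only 2k bases are stored per seed while the window spans 2k + gap bases, a longer gap adds specificity without adding to the size of each stored element.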

16.
BMC Med Genomics ; 8 Suppl 3: S1, 2015.
Article in English | MEDLINE | ID: mdl-26399504

ABSTRACT

BACKGROUND: Reading the nucleotides from two ends of a DNA fragment is called paired-end tag (PET) sequencing. When the fragment length is longer than the combined read length, there remains a gap of unsequenced nucleotides between read pairs. If the target in such experiments is sequenced at a level to provide redundant coverage, it may be possible to bridge these gaps using bioinformatics methods. Konnector is a local de novo assembly tool that addresses this problem. Here we report on version 2.0 of our tool. RESULTS: Konnector uses a probabilistic and memory-efficient data structure called Bloom filter to represent a k-mer spectrum - all possible sequences of length k in an input file, such as the collection of reads in a PET sequencing experiment. It performs look-ups to this data structure to construct an implicit de Bruijn graph, which describes (k-1) base pair overlaps between adjacent k-mers. It traverses this graph to bridge the gap between a given pair of flanking sequences. CONCLUSIONS: Here we report the performance of Konnector v2.0 on simulated and experimental datasets, and compare it against other tools with similar functionality. We note that, representing k-mers with 1.5 bytes of memory on average, Konnector can scale to very large genomes. With our parallel implementation, it can also process over a billion bases on commodity hardware.


Subjects
DNA Sequence Analysis/methods , Software , Algorithms , DNA/chemistry , High-Throughput Nucleotide Sequencing
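
The gap-bridging step described above can be sketched as a bounded breadth-first search over an implicit de Bruijn graph: neighbors are found by membership queries rather than stored edges. For brevity this example uses a plain Python set where Konnector uses a Bloom filter, and the function names and step limit are assumptions.

```python
from collections import deque
from typing import Iterable, Optional, Set


def kmer_spectrum(reads: Iterable[str], k: int) -> Set[str]:
    """All k-mers present in the reads (a set stands in for the Bloom filter)."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def bridge(kmers: Set[str], start: str, goal: str, max_steps: int) -> Optional[str]:
    """Search for a path from start k-mer to goal k-mer; return its sequence or None."""
    queue = deque([(start, start)])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) - len(start) >= max_steps:
            continue  # do not extend beyond the allowed number of bases
        for base in "ACGT":
            nxt = node[1:] + base  # (k-1)-base overlap with the current k-mer
            if nxt in kmers and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + base))
    return None


# Two overlapping toy reads; bridge from the first k-mer to the last one.
spectrum = kmer_spectrum(["ACGTTGC", "TTGCATC"], k=4)
print(bridge(spectrum, start="ACGT", goal="CATC", max_steps=10))
```

In practice the start and goal would be the last k-mer of one read and the first k-mer of its mate, and the step limit would come from the expected fragment length.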
17.
BMC Bioinformatics ; 16: 230, 2015 Jul 25.
Article in English | MEDLINE | ID: mdl-26209068

ABSTRACT

BACKGROUND: While next-generation sequencing technologies have made sequencing genomes faster and more affordable, deciphering the complete genome sequence of an organism remains a significant bioinformatics challenge, especially for large genomes. Low sequence coverage, repetitive elements and short read length make de novo genome assembly difficult, often resulting in sequence and/or fragment "gaps" - uncharacterized nucleotide (N) stretches of unknown or estimated lengths. Some of these gaps can be closed by re-processing latent information in the raw reads. Even though there are several tools for closing gaps, they do not easily scale up to processing billion base pair genomes. RESULTS: Here we describe Sealer, a tool designed to close gaps within assembly scaffolds by navigating de Bruijn graphs represented by space-efficient Bloom filter data structures. We demonstrate how it scales to successfully close 50.8% and 13.8% of gaps in human (3 Gbp) and white spruce (20 Gbp) draft assemblies in under 30 and 27 h, respectively - a feat that is not possible with other leading tools with the breadth of data used in our study. CONCLUSION: Sealer is an automated finishing application that uses the succinct Bloom filter representation of a de Bruijn graph to close gaps in draft assemblies, including that of very large genomes. We expect Sealer to have broad utility for finishing genomes across the tree of life, from bacterial genomes to large plant genomes and beyond. Sealer is available for download at https://github.com/bcgsc/abyss/tree/sealer-release.


Subjects
Computational Biology/methods , User-Computer Interface , Algorithms , Human Genome , Plant Genome , High-Throughput Nucleotide Sequencing , Humans , Internet , Pinaceae/genetics , DNA Sequence Analysis
18.
PLoS One ; 10(5): e0128026, 2015.
Article in English | MEDLINE | ID: mdl-26020645

ABSTRACT

When working on an ongoing genome sequencing and assembly project, it is rather inconvenient when gene identifiers change from one build of the assembly to the next. The gene labelling system described here, UniqTag, addresses this common challenge. UniqTag assigns a unique identifier to each gene that is a representative k-mer, a string of length k, selected from the sequence of that gene. Unlike serial numbers, these identifiers are stable between different assemblies and annotations of the same data without requiring that previous annotations be lifted over by sequence alignment. We assign UniqTag identifiers to ten builds of the Ensembl human genome spanning eight years to demonstrate this stability. The implementation of UniqTag in Ruby and an R package are available at https://github.com/sjackman/uniqtag. The R package is also available from CRAN: install.packages("uniqtag"). Supplementary material and code to reproduce it is available at https://github.com/sjackman/uniqtag-paper.


Subjects
Human Genome , Molecular Sequence Annotation/methods , DNA Sequence Analysis/methods , Software , Humans
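
The abstract does not state how the representative k-mer is chosen. One plausible rule, assumed here for illustration, is to take the k-mer of the gene that occurs in the fewest genes and break ties lexicographically. This Python sketch is not the Ruby or R implementation linked above.

```python
from collections import Counter
from typing import Dict


def uniqtags(genes: Dict[str, str], k: int) -> Dict[str, str]:
    """Map each gene name to one representative k-mer of its sequence.

    Assumed rule: fewest genes containing the k-mer, then lexicographic order.
    Every sequence must be at least k characters long.
    """
    gene_kmers = {
        name: {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for name, seq in genes.items()
    }
    # Number of genes in which each k-mer appears.
    counts = Counter(kmer for kmers in gene_kmers.values() for kmer in kmers)
    return {
        name: min(kmers, key=lambda kmer: (counts[kmer], kmer))
        for name, kmers in gene_kmers.items()
    }


# Toy sequences: AAA is shared by both genes, so it is not chosen as a tag.
print(uniqtags({"g1": "AAACCC", "g2": "AAAGGG"}, k=3))
```

Because the tag depends only on sequence content, the same gene receives the same identifier in a later build as long as its chosen k-mer survives unchanged.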
19.
Plant J ; 83(2): 189-212, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26017574

ABSTRACT

White spruce (Picea glauca), a gymnosperm tree, has been established as one of the models for conifer genomics. We describe the draft genome assemblies of two white spruce genotypes, PG29 and WS77111, innovative tools for the assembly of very large genomes, and the conifer genomics resources developed in this process. The two white spruce genotypes originate from distant geographic regions of western (PG29) and eastern (WS77111) North America, and represent elite trees in two Canadian tree-breeding programs. We present an update (V3 and V4) for a previously reported PG29 V2 draft genome assembly and introduce a second white spruce genome assembly for genotype WS77111. Assemblies of the PG29 and WS77111 genomes confirm the reconstructed white spruce genome size in the 20 Gbp range, and show broad synteny. Using the PG29 V3 assembly and additional white spruce genomics and transcriptomics resources, we performed MAKER-P annotation and meticulous expert annotation of very large gene families of conifer defense metabolism, the terpene synthases and cytochrome P450s. We also comprehensively annotated the white spruce mevalonate, methylerythritol phosphate and phenylpropanoid pathways. These analyses highlighted the large extent of gene and pseudogene duplications in a conifer genome, in particular for genes of secondary (i.e. specialized) metabolism, and the potential for gain and loss of function for defense and adaptation.


Subjects
Plant Genome , Multigene Family , Phenols/metabolism , Picea/genetics , Terpenes/metabolism , Alkyl and Aryl Transferases/metabolism , Computational Biology , Cytochrome P-450 Enzyme System/metabolism , Transcriptome
20.
PLoS One ; 10(4): e0126409, 2015.
Article in English | MEDLINE | ID: mdl-25923767

ABSTRACT

One essential application in bioinformatics that is affected by the high-throughput sequencing data deluge is the sequence alignment problem, where nucleotide or amino acid sequences are queried against targets to find regions of close similarity. When queries are too many and/or targets are too large, the alignment process becomes computationally challenging. This is usually addressed by preprocessing techniques, where the queries and/or targets are indexed for easy access while searching for matches. When the target is static, such as in an established reference genome, the cost of indexing is amortized by reusing the generated index. However, when the targets are non-static, such as contigs in the intermediate steps of a de novo assembly process, a new index must be computed for each run. To address such scalability problems, we present DIDA, a novel framework that distributes the indexing and alignment tasks into smaller subtasks over a cluster of compute nodes. It provides a workflow beyond the common practice of embarrassingly parallel implementations. DIDA is a cost-effective, scalable and modular framework for the sequence alignment problem in terms of memory usage and runtime. It can be employed in large-scale alignments to draft genomes and intermediate stages of de novo assembly runs. The DIDA source code, sample files and user manual are available through http://www.bcgsc.ca/platform/bioinfo/software/dida. The software is released under the British Columbia Cancer Agency License (BCCA), and is free for academic use.


Subjects
Computational Biology/methods , Genetic Databases , Sequence Alignment/methods , Software , Humans
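
The general shape of distributing indexing and alignment can be sketched as four steps: partition the targets, index each partition separately, dispatch each query only to partitions that might contain it, and merge the hits. The version below runs serially in one process, uses k-mer sets for dispatch and exact substring search as a stand-in for a real aligner; all of these are simplifying assumptions, not DIDA's implementation.

```python
from typing import Dict, List, Set, Tuple

Partition = Dict[str, str]  # target name -> target sequence


def partition_targets(targets: Dict[str, str], n: int) -> List[Partition]:
    """Split targets round-robin (by sorted name) into n partitions."""
    parts: List[Partition] = [{} for _ in range(n)]
    for i, name in enumerate(sorted(targets)):
        parts[i % n][name] = targets[name]
    return parts


def index_partition(part: Partition, k: int) -> Set[str]:
    """Index step for one partition: the set of k-mers in its targets."""
    return {seq[i:i + k] for seq in part.values() for i in range(len(seq) - k + 1)}


def dispatch(query: str, indexes: List[Set[str]], k: int) -> List[int]:
    """Partitions sharing at least one k-mer with the query."""
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    return [i for i, index in enumerate(indexes) if query_kmers & index]


def align(query: str, part: Partition) -> List[Tuple[str, int]]:
    """Stand-in aligner: first exact match position of the query in each target."""
    return [(name, seq.find(query)) for name, seq in part.items() if query in seq]


def distributed_align(query: str, targets: Dict[str, str], n: int, k: int) -> List[Tuple[str, int]]:
    parts = partition_targets(targets, n)
    indexes = [index_partition(part, k) for part in parts]  # one index per "node"
    hits: List[Tuple[str, int]] = []
    for i in dispatch(query, indexes, k):
        hits.extend(align(query, parts[i]))  # merge step
    return sorted(hits)


toy_targets = {"t1": "ACGTACGTAA", "t2": "TTTTGGGGCC", "t3": "CCCCAAAATT"}
print(distributed_align("GGGG", toy_targets, n=2, k=4))
```

On a cluster, each partition's indexing and alignment would run on a different node, so no single node has to hold an index of the whole target set.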